
Stable Causal Discovery via Directed Acyclic Graph Aggregation

arXiv.org Machine Learning

Directed Acyclic Graphs (DAGs) are central to uncovering causal structure in complex systems, yet learning a single DAG from data is often challenging: model uncertainty, finite samples, and a combinatorially large search space frequently yield unstable estimates. We propose DAGgr, a model averaging framework that aggregates multiple candidate DAGs into a single stable representation. Candidate graphs are weighted by their out-of-sample predictive likelihood across repeated data splits, and a thresholding rule on the resulting edge-importance scores guarantees that the aggregated graph is itself acyclic. We establish a finite-sample risk bound, prove that the procedure preserves acyclicity, and show that edge selection is consistent under mild conditions on the weights. Simulations across random, hub, and chain structures, together with an analysis of the Sachs et al. (2005) protein-signaling network, show that DAGgr matches or exceeds the best individual candidate while consistently outperforming bootstrap-aggregation baselines across structural recovery metrics.
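The aggregation idea in this abstract can be sketched as follows. This is a minimal illustration and not the DAGgr procedure itself: the abstract does not give the weighting formula or the thresholding rule, so the softmax weights over held-out log-likelihoods, the threshold `tau`, and the function names `aggregate_dags` and `is_acyclic` are all assumptions. In particular, acyclicity is checked explicitly here rather than guaranteed by the threshold.

```python
import numpy as np

def aggregate_dags(adjacency, heldout_loglik, tau=0.5):
    """Weighted edge-score aggregation of candidate DAGs (illustrative only)."""
    A = np.asarray(adjacency, dtype=float)        # shape (K, p, p); A[k, i, j] = 1 means i -> j
    ll = np.asarray(heldout_loglik, dtype=float)  # shape (K,); one score per candidate
    w = np.exp(ll - ll.max())                     # softmax weights: an assumption, not the paper's rule
    w /= w.sum()
    scores = np.tensordot(w, A, axes=1)           # weighted edge-importance scores, shape (p, p)
    return (scores > tau).astype(int), scores, w  # tau is a hypothetical threshold

def is_acyclic(adj):
    """Kahn's algorithm: True if the directed graph has no cycle."""
    adj = np.asarray(adj)
    indegree = adj.sum(axis=0).astype(int)
    stack = [i for i in range(adj.shape[0]) if indegree[i] == 0]
    visited = 0
    while stack:
        u = stack.pop()
        visited += 1
        for v in np.flatnonzero(adj[u]):
            indegree[v] -= 1
            if indegree[v] == 0:
                stack.append(int(v))
    return visited == adj.shape[0]

# Three hypothetical candidate DAGs on three nodes, with equal held-out scores.
candidates = [
    [[0, 1, 0], [0, 0, 1], [0, 0, 0]],  # 0 -> 1, 1 -> 2
    [[0, 1, 1], [0, 0, 0], [0, 0, 0]],  # 0 -> 1, 0 -> 2
    [[0, 0, 0], [1, 0, 1], [0, 0, 0]],  # 1 -> 0, 1 -> 2
]
aggregated, scores, weights = aggregate_dags(candidates, [0.0, 0.0, 0.0])
print(aggregated)
print(is_acyclic(aggregated))
```

With equal weights, only edges present in two of the three candidates clear the threshold. A fixed threshold of 0.5 does not by itself rule out longer cycles, which is why the sketch verifies acyclicity separately.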


A Related Work

Neural Information Processing Systems

We review important related work to clarify where AdvInfoNCE stands and what role it plays in the rich literature. Our work relates to contrastive learning-based collaborative filtering (CL-based CF) methods and to the theoretical understanding of contrastive loss in collaborative filtering.

A.1 Contrastive Learning-based Collaborative Filtering

The latest CL-based CF methods fall roughly into two research lines. The second category, referred to as "loss-based" approaches, mainly focuses on modifying the contrastive loss. In loss-based CF models, interacted items serve as positive instances. The prevailing augmentation-based paradigm in CL-based CF methods employs user-item bipartite graph augmentations to generate contrasting views. These contrasting views are then treated as positive instances when applying a contrastive loss, such as the InfoNCE loss, to further enhance collaborative filtering signals.
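For readers unfamiliar with the InfoNCE loss mentioned above, a minimal sketch of its standard form for a single user follows. It is not AdvInfoNCE; the function name `info_nce`, the temperature default, and the example scores are assumptions made for illustration.

```python
import numpy as np

def info_nce(pos_score, neg_scores, temperature=0.1):
    """Standard InfoNCE loss for one positive and several negative scores."""
    logits = np.concatenate(([pos_score], np.asarray(neg_scores, dtype=float))) / temperature
    m = logits.max()                                   # shift for numerical stability
    log_denominator = m + np.log(np.exp(logits - m).sum())
    return float(log_denominator - logits[0])          # -log softmax of the positive

# Hypothetical similarity scores between one user and candidate items.
loss_uninformative = info_nce(0.0, [0.0, 0.0, 0.0])    # positive is not separated from negatives
loss_separated = info_nce(1.0, [0.0, 0.0, 0.0])        # positive scores higher than negatives
print(loss_uninformative, loss_separated)
```

When the positive score equals every negative score, the loss reduces to the log of the number of candidates, and it shrinks as the positive pulls away from the negatives.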


A Self-Supervised Learning Methods

Neural Information Processing Systems

L.1 Source Dataset: ImageNet

Table 13 and Table 14 report 5-way 1-shot and 5-way 5-shot CD-FSL performance, respectively, when ImageNet is used as the source dataset. Note that Table 14 is included for convenience and is identical to Table 3 in the main paper.



0cddb777d3441326544e21b67f41bdc8-Supplemental-Conference.pdf

Neural Information Processing Systems

In this section, we prove Theorem 2.1, which states that a problem $P$ and its orthogonally transformed problem $Q(P) = \{\{Qx_i\}_{i=1}^{N}, f\}$ have identical optimal solutions if $Q$ is an orthogonal matrix: $QQ^\top = Q^\top Q = I$. As mentioned in Section 2.2, the reward $R$ is a function of $a_{1:T}$ (solution sequences), $\|x_i - x_j\|_{i,j \in \{1,\dots,N\}}$ (relative distances), and $f$ (node features). Let $R^*(P)$ be the optimal value of problem $P$. Then the remaining proof is to show that $Q(P)$ has the same solution set as $P$. Let the optimal solution set be $\Pi^*(P) = \{\pi_i(P)\}_{i=1}^{M}$, where $\pi_i(P)$ denotes an optimal solution of $P$ and $M$ is the number of heterogeneous optimal solutions. Conversely, any $\pi_i(P) \in \Pi^*(P)$ attains the same optimal value on $Q(P)$: $R(\pi_i(P); P) = R^*(P) = R^*(Q(P))$. Thus, $\pi_i(P) \in \Pi^*(Q(P))$.
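The invariance that this proof relies on can be checked numerically: an orthogonal transformation leaves all pairwise distances unchanged, so any reward that depends on coordinates only through those distances is unchanged too. The sketch below assumes a routing-style reward (closed tour length), which is a hypothetical stand-in for the reward $R$; the helper names and the random instance are also assumptions.

```python
import numpy as np

def pairwise_distances(x):
    """Matrix of Euclidean distances between all rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def random_orthogonal(dim, rng):
    """Orthogonal matrix taken from the QR factorization of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q

def tour_length(x, order):
    """Length of the closed tour that visits the rows of x in the given order."""
    ordered = x[list(order)]
    return float(np.linalg.norm(ordered - np.roll(ordered, -1, axis=0), axis=1).sum())

rng = np.random.default_rng(0)
x = rng.random((6, 2))             # hypothetical node coordinates x_i
Q = random_orthogonal(2, rng)
x_rot = x @ Q.T                    # row i of x_rot is Q x_i

order = [0, 3, 1, 5, 2, 4]         # an arbitrary solution sequence
print(np.allclose(pairwise_distances(x), pairwise_distances(x_rot)))
print(np.isclose(tour_length(x, order), tour_length(x_rot, order)))
```

Because every solution sequence has the same value on both instances, the sets of optimal sequences coincide, which is the content of the theorem.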




A Muon-Accelerated Algorithm for Low Separation Rank Tensor Generalized Linear Models

arXiv.org Machine Learning

Tensor-valued data arise naturally in multidimensional signal and imaging problems, such as biomedical imaging. When incorporated into generalized linear models (GLMs), naive vectorization can destroy their multi-way structure and lead to high-dimensional, ill-posed estimation. To address this challenge, Low Separation Rank (LSR) decompositions reduce model complexity by imposing low-rank multilinear structure on the coefficient tensor. A representative approach for estimating LSR-based tensor GLMs (LSR-TGLMs) is the Low Separation Rank Tensor Regression (LSRTR) algorithm, which adopts block coordinate descent and enforces orthogonality of the factor matrices through repeated QR-based projections. However, the repeated projection steps can be computationally demanding and slow convergence. Motivated by the need for scalable estimation and classification from such data, we propose LSRTR-M, which incorporates Muon (MomentUm Orthogonalized by Newton-Schulz) updates into the LSRTR framework. Specifically, LSRTR-M preserves the original block coordinate scheme while replacing the projection-based factor updates with Muon steps. Across synthetic linear, logistic, and Poisson LSR-TGLMs, LSRTR-M converges faster in both iteration count and wall-clock time, while achieving lower normalized estimation and prediction errors. On the Vessel MNIST 3D task, it further improves computational efficiency while maintaining competitive classification performance.
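To illustrate the kind of update the abstract refers to, the following sketch orthogonalizes a momentum matrix with Newton-Schulz iterations instead of a QR projection. It uses the classical cubic Newton-Schulz iteration, not the tuned polynomial used in Muon implementations, and it is not the LSRTR-M algorithm. The function names, step count, learning rate, and momentum coefficient are assumptions.

```python
import numpy as np

def newton_schulz_orthogonalize(M, steps=30):
    """Approximate the orthogonal polar factor of M with cubic Newton-Schulz steps."""
    X = np.asarray(M, dtype=float)
    X = X / (np.linalg.norm(X) + 1e-12)    # Frobenius scaling keeps singular values at most 1
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ (X.T @ X)  # pushes every nonzero singular value toward 1
    return X

def muon_style_step(W, grad, momentum, lr=0.1, beta=0.9):
    """One simplified momentum-then-orthogonalize update (hypothetical hyperparameters)."""
    momentum = beta * momentum + grad
    W_new = W - lr * newton_schulz_orthogonalize(momentum)
    return W_new, momentum

rng = np.random.default_rng(0)
G = rng.standard_normal((8, 3))            # stand-in gradient for one factor matrix
O = newton_schulz_orthogonalize(G)
print(np.allclose(O.T @ O, np.eye(3), atol=1e-6))

W0 = np.zeros((8, 3))
W1, m1 = muon_style_step(W0, G, np.zeros((8, 3)))
```

The iteration needs only matrix products, which is the practical appeal over repeated QR factorizations; how much this helps in a given setting depends on matrix sizes and hardware.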